import numpy as np
import matplotlib.pyplot as plt
import IPython.display as ipd
from scipy.signal import fftconvolve
import librosa
The formula for the (discrete) convolution of a signal $x$ with an impulse response $h$ is as follows:

$$ y[n] = \sum_{m} x[m] h[n-m] $$
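As a sanity check, the sum above can be implemented directly in a few lines (a straightforward sketch with made-up toy arrays, not the method we'll actually use):

```python
import numpy as np

def convolve_direct(x, h):
    """Direct convolution, computed straight from the sum y[n] = sum_m x[m] h[n-m]."""
    N, M = len(x), len(h)
    y = np.zeros(N + M - 1)
    for n in range(N + M - 1):
        for m in range(N):
            if 0 <= n - m < M:  # only terms where h[n-m] is defined
                y[n] += x[m] * h[n - m]
    return y

a = np.array([1.0, 2.0, 3.0])
b = np.array([0.0, 1.0, 0.5])
print(convolve_direct(a, b))  # same result as np.convolve(a, b)
```

Note the nested loops: for signals with $N$ and $M$ samples this costs $O(NM)$ operations, which is what makes the direct approach impractical for audio-length signals.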
But computing this sum directly is slow! We want to use the convolution/multiplication property of the DFT so that we can leverage the Fast Fourier Transform to perform the convolution for us. This is exactly what fftconvolve does, and the result is computed very quickly!
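fftconvolve returns the same answer as direct convolution, just far faster for long signals. A quick equivalence check on random arrays (the lengths here are arbitrary):

```python
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)
x_test = rng.standard_normal(4096)  # toy "signal"
h_test = rng.standard_normal(512)   # toy "impulse response"

y_direct = np.convolve(x_test, h_test)  # direct O(N*M) sum
y_fft = fftconvolve(x_test, h_test)     # FFT-based, O((N+M) log(N+M))

print(np.allclose(y_direct, y_fft))  # True, up to floating-point roundoff
```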
sr = 44100
x, sr = librosa.load("aud_jessiesgirl.wav", sr=sr)
ipd.Audio(x, rate=sr)
sr = 44100
h, sr = librosa.load("imp_JFKTunnel.wav", sr=sr)
ipd.Audio(h, rate=sr)
y = fftconvolve(x, h)
ipd.Audio(y, rate=sr)
To use the DFT, we want to do the following:
Perform the DFT of x and of h to get $X[k]$ and $H[k]$, respectively
Compute $Y[k] = X[k]H[k]$ for all $k$
Compute the inverse Fourier transform of $Y[k]$ to get the convolved signal back in the time domain
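To see why this plan needs care, here's what happens if we multiply same-length DFTs naively: we get *circular* convolution, in which the tail of the result wraps around and adds onto the beginning (a toy example with made-up arrays):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([1.0, 1.0, 0.0, 0.0])

# Naive DFT multiplication with no zeropadding: circular convolution
naive = np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

# What we actually want: linear convolution, length 4 + 4 - 1 = 7
linear = np.convolve(a, b)

print(naive)   # first sample is contaminated by the wrapped-around tail
print(linear)  # no wrap-around; the tail has room of its own
```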
But there are a few problems with this plan. First, note that the length of the convolution of $x$ and $h$ is actually $\text{len}(x) + \text{len}(h) - 1$.
N = len(x)
M = len(h)
print("N = ", N)
print("M = ", M)
print("len(y) = ", len(y), M+N-1)
This means that if we simply perform the inverse DFT of something with length $\text{len}(x)$, we won't have enough space to hold all of the echoes. Furthermore, when we derived the formula, we assumed that $x$ and $h$ were the same length, which is definitely not true in general! What we need to do is zeropad both $x$ and $h$; that is, we create two new signals $xz$ and $hz$, each with $M+N-1$ samples. The first $\text{len}(x)$ samples of $xz$ are $x$, and the rest are zeros. Likewise, the first $\text{len}(h)$ samples of $hz$ are $h$, and the rest are zeros.
xz = np.zeros(M+N-1)
xz[0:len(x)] = x
hz = np.zeros(M+N-1)
hz[0:len(h)] = h
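On a small toy pair, zeropadding to length $M+N-1$ is exactly what makes DFT multiplication match linear convolution:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, -1.0])
L = len(a) + len(b) - 1  # enough room for the full convolution

az = np.zeros(L)
az[:len(a)] = a  # a, followed by zeros
bz = np.zeros(L)
bz[:len(b)] = b  # b, followed by zeros

y_fft = np.real(np.fft.ifft(np.fft.fft(az) * np.fft.fft(bz)))
print(np.allclose(y_fft, np.convolve(a, b)))  # True
```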
We note that zeropadding doesn't change the overall frequency pattern; it simply "interpolates" values in between. In other words, we end up with more frequency bins, but the new bins are determined entirely by the ones that are already there; no new information has been added.
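One concrete way to see that no new information is added: if we zeropad a length-$N$ signal to length $2N$, every other bin of the new DFT lands exactly on a bin of the original DFT, and the odd-indexed bins are the "interpolated" values in between (the toy signal here is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
s = rng.standard_normal(16)             # length-N toy signal
sz = np.concatenate([s, np.zeros(16)])  # zeropadded to length 2N

S = np.fft.fft(s)
Sz = np.fft.fft(sz)

# Even-indexed bins of the padded DFT coincide with the original bins
print(np.allclose(Sz[::2], S))  # True
```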
plt.subplot(211)
plt.plot(np.abs(np.fft.fft(h)))
plt.title("h")
plt.subplot(212)
plt.plot(np.abs(np.fft.fft(hz)))
plt.title("Zeropadded h")
plt.tight_layout()
Finally, since our zeropadded signals share a common length that's long enough to hold the full convolution, we can implement our plan: multiply in the frequency domain, then go back to the time domain with an inverse Fourier transform, and we get our convolution!
X = np.fft.fft(xz)
H = np.fft.fft(hz)
Y = X*H # Using the fact that convolution is multiplication in the frequency domain
y = np.fft.ifft(Y) # Perform the inverse DFT
y = np.real(y) # Discard the tiny imaginary parts left over from floating-point roundoff
ipd.Audio(y, rate=sr)